Bioinformatics (Thomas Dandekar, Meik Kunz)

8.1

When Does It Become a Challenge for the Computer?

However, the moment we recognise that this may be the unavoidable limitation of our

computational approach and that we naturally systematically do not take these imponder

able, qualitative aspects into account in our bioinformatic models, we are already a signifi

cant step further. So let’s keep in mind: bioinformatics tries to describe biological reality

in clear, transparent models, for example how a normal cell becomes a cancer cell. By

using the computer and experimental data, I make myself blind in terms of experience and

other direct interactions with nature, but I have the undeniable advantage of having quan

titative statements about the biological process through numbers and measures (“give

numbers to the arrows” is what Leroy Hood once called it). This alone, through these

quantitative glasses, prevents being drowned in too much unprovable theory. For example,

the model predicts that 80% of cancer cells will die from treatment. We can simply mea

sure in experiments how far this is true. This also brings up another important implication

for all bioinformatics analyses. For example, if we have found a related sequence to a

sequence of which we know more about the function by sequence comparison, then we

should continue this chain of reasoning (from sequence comparison to sequence compari

son) until we have a clear experiment on the last sequence that biochemically or molecu

larly confirms the function of the protein associated with the sequence. Only then do we

have solid ground within the framework of our model.

So much for the bioinformatic model, which should therefore always base its own cal

culations on solid, experimental data. Now a few sentences about the calculations: After

all, it could be that these calculations take a very long time, and anyone who has ever

“kicked” his computer with such a complicated, lengthy calculation knows the problem of

wondering, “When will this limited computational box finally stop calculating?” The

problems where this is unresolved are called NP problems (NP stands for nondeterministic

polynomial time). There is no simple formula (a polynomial) that allows one to calculate

how long the computer will compute based on the length of the input. Unfortunately, most

biologically exciting problems are such NP problems. This is because biomolecules and

all higher processes in the cell are usually modular, made up of similar or identical units

(see Part 1). Thus, the addition of only one further unit leads to a multiple increase in

computation time, and such combinatorial problems (“combinatorial explosion”) there

fore almost always occur in our biological modelling. This leads to corresponding uncer

tainties in the computation time. However, one can help oneself with fixed specifications,

so-called “stopping criteria“, i.e. stopping specifications for the computer, e.g. “please

stop after one hour of computing time”. But more important is the fact that with a fixed

calculation time it is not possible to estimate how good the solution found up to that point

is in comparison to the best or optimal solution. But that’s just life: Not so easy to grasp!

2014

8.1

8 When Does the Computer Stop Calculating?